{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating discrete Bayesian Networks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section, we show an example of creating a Bayesian Network in pgmpy from scratch. We use the cancer model (http://www.bnlearn.com/bnrepository/#cancer) for the example. The model structure is shown below.\n", "\n", "In pgmpy, the model structure and its parametrization (the conditional probability distributions, or CPDs) do not depend on each other. So, the workflow is to first define the model structure, then define all the parameters (CPDs), and then add these parameters to the model. These CPDs can later be modified, removed, or replaced without changing the model structure or defining a new one." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import Image\n", "\n", "Image(\"images/cancer.png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Define the model structure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `BayesianNetwork` can be initialized by passing a list of the edges in the model structure. In this case, there are 4 edges in the model: Pollution -> Cancer, Smoker -> Cancer, Cancer -> Xray, Cancer -> Dyspnoea." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from pgmpy.models import BayesianNetwork\n", "\n", "cancer_model = BayesianNetwork(\n", " [\n", " (\"Pollution\", \"Cancer\"),\n", " (\"Smoker\", \"Cancer\"),\n", " (\"Cancer\", \"Xray\"),\n", " (\"Cancer\", \"Dyspnoea\"),\n", " ]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Define the CPDs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each node of a Bayesian Network has a CPD associated with it, so we need to define 5 CPDs in this case. 
In pgmpy, CPDs can be defined using the `TabularCPD` class. For details on the parameters, please refer to the documentation: https://pgmpy.org/_modules/pgmpy/factors/discrete/CPD.html" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from pgmpy.factors.discrete import TabularCPD\n", "\n", "cpd_poll = TabularCPD(variable=\"Pollution\", variable_card=2, values=[[0.9], [0.1]])\n", "cpd_smoke = TabularCPD(variable=\"Smoker\", variable_card=2, values=[[0.3], [0.7]])\n", "cpd_cancer = TabularCPD(\n", " variable=\"Cancer\",\n", " variable_card=2,\n", " values=[[0.03, 0.05, 0.001, 0.02], [0.97, 0.95, 0.999, 0.98]],\n", " evidence=[\"Smoker\", \"Pollution\"],\n", " evidence_card=[2, 2],\n", ")\n", "cpd_xray = TabularCPD(\n", " variable=\"Xray\",\n", " variable_card=2,\n", " values=[[0.9, 0.2], [0.1, 0.8]],\n", " evidence=[\"Cancer\"],\n", " evidence_card=[2],\n", ")\n", "cpd_dysp = TabularCPD(\n", " variable=\"Dyspnoea\",\n", " variable_card=2,\n", " values=[[0.65, 0.3], [0.35, 0.7]],\n", " evidence=[\"Cancer\"],\n", " evidence_card=[2],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: Add the CPDs to the model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After defining the model parameters, we can add them to the model using the `add_cpds` method. The `check_model` method can then be used to verify that the CPDs are correctly defined for the model structure." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Associate the parameters with the model structure.\n", "cancer_model.add_cpds(cpd_poll, cpd_smoke, cpd_cancer, cpd_xray, cpd_dysp)\n", "\n", "# Check that the CPDs are valid for the model.\n", "cancer_model.check_model()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4: Run basic operations on the model" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False\n", "True\n" ] } ], "source": [ "# Check whether two nodes are d-connected (i.e. not d-separated), optionally given observed nodes\n", "print(cancer_model.is_dconnected(\"Pollution\", \"Smoker\"))\n", "print(cancer_model.is_dconnected(\"Pollution\", \"Smoker\", observed=[\"Cancer\"]))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Pollution': {'Cancer', 'Dyspnoea', 'Pollution', 'Xray'}}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get all nodes d-connected to a given node\n", "\n", "cancer_model.active_trail_nodes(\"Pollution\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(Xray ⟂ Smoker, Pollution, Dyspnoea | Cancer)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# List local independencies for a node\n", "\n", "cancer_model.local_independencies(\"Xray\")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(Xray ⟂ Smoker, Pollution, Dyspnoea | Cancer)\n", "(Xray ⟂ Pollution, Dyspnoea | Smoker, Cancer)\n", "(Xray ⟂ Smoker, Dyspnoea | Pollution, Cancer)\n", "(Xray ⟂ Smoker, Pollution | Cancer, Dyspnoea)\n", "(Xray ⟂ Dyspnoea | Smoker, Pollution, Cancer)\n", "(Xray ⟂ Pollution | Smoker, 
Cancer, Dyspnoea)\n", "(Xray ⟂ Smoker | Pollution, Cancer, Dyspnoea)\n", "(Smoker ⟂ Pollution)\n", "(Smoker ⟂ Xray, Dyspnoea | Cancer)\n", "(Smoker ⟂ Xray, Dyspnoea | Pollution, Cancer)\n", "(Smoker ⟂ Dyspnoea | Xray, Cancer)\n", "(Smoker ⟂ Xray | Cancer, Dyspnoea)\n", "(Smoker ⟂ Dyspnoea | Pollution, Xray, Cancer)\n", "(Smoker ⟂ Xray | Pollution, Cancer, Dyspnoea)\n", "(Pollution ⟂ Smoker)\n", "(Pollution ⟂ Xray, Dyspnoea | Cancer)\n", "(Pollution ⟂ Xray, Dyspnoea | Smoker, Cancer)\n", "(Pollution ⟂ Dyspnoea | Xray, Cancer)\n", "(Pollution ⟂ Xray | Cancer, Dyspnoea)\n", "(Pollution ⟂ Dyspnoea | Smoker, Xray, Cancer)\n", "(Pollution ⟂ Xray | Smoker, Cancer, Dyspnoea)\n", "(Dyspnoea ⟂ Smoker, Pollution, Xray | Cancer)\n", "(Dyspnoea ⟂ Pollution, Xray | Smoker, Cancer)\n", "(Dyspnoea ⟂ Smoker, Xray | Pollution, Cancer)\n", "(Dyspnoea ⟂ Smoker, Pollution | Xray, Cancer)\n", "(Dyspnoea ⟂ Xray | Smoker, Pollution, Cancer)\n", "(Dyspnoea ⟂ Pollution | Smoker, Xray, Cancer)\n", "(Dyspnoea ⟂ Smoker | Pollution, Xray, Cancer)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get all independence conditions implied by the model\n", "\n", "cancer_model.get_independencies()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading example models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make it quick to try out different features, pgmpy can also load some example models directly from the bnlearn repository." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nodes in the model: ['Pollution', 'Smoker', 'Cancer', 'Xray', 'Dyspnoea']\n", "Edges in the model: [('Pollution', 'Cancer'), ('Smoker', 'Cancer'), ('Cancer', 'Xray'), ('Cancer', 'Dyspnoea')]\n" ] }, { "data": { "text/plain": [ "[,\n", " ,\n", " ,\n", " ,\n", " ]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pgmpy.utils import get_example_model\n", "\n", "model = get_example_model(\"cancer\")\n", "print(\"Nodes in the model:\", model.nodes())\n", "print(\"Edges in the model:\", model.edges())\n", "model.get_cpds()" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 1 }